Method and system for supporting a system call and interprocess communication in a fault-tolerant, scalable distributed computer environment

ABSTRACT

A distributed computing system environment includes multiple CPUs, multiple non-shared memory spaces and a means for implementing system calls and interprocess communications. The system is both fault-tolerant and scalable in that agents exist independently in each non-shared memory space to handle interprocess connections between memory spaces.

REFERENCE TO PRIOR APPLICATION

[0001] This application is a Division of U.S. Application Ser. No. 08/835,398, filed Apr. 7, 1997, titled “Method and Apparatus for Supporting A Select( ) System Call and Interprocess Communication In A Fault-Tolerant, Scalable Distributed Computer Environment,” which application is incorporated herein by reference and which application claims the benefit of U.S. Provisional Application Ser. No. 60/024,769, filed Aug. 21, 1996.

BACKGROUND OF THE INVENTION

[0002] The present invention is related to the field of digital circuits and to the field of instructions and program code for the operation thereof. More particularly, the present invention is related to data communication between digital circuits or between processes running on the same circuit. The invention is further related to fault-tolerant and scalable distributed processor and distributed memory digital processing systems.

[0003] This discussion of the present invention presupposes some familiarity with computer programming and software, particularly data input and output (I/O), network communications, and interprocess communications. The present invention also presupposes some familiarity with multiprocessor and nonshared memory computer systems, as discussed in co-assigned U.S. Pat. No. 4,228,496, which is incorporated herein by reference to the extent necessary to understand and enable the invention. This discussion of the present invention also presupposes some familiarity with the UNIX and related operating systems and with the well-known sockets protocol for enabling interprocess and network communication.

[0004] In all cases, the glossary and specific examples given herein are intended to be illustrative of the invention but not limiting. It will be apparent to anyone of skill in the art that the present invention may be implemented in an unlimited variety of operating system environments. Therefore, the invention should not be limited except as provided in the attached claims.

[0005] In UNIX and other operating systems (OS's), processes use a select( ) (or a similar) system call to inform the OS kernel that they are interested in a particular resource for interprocess communication. A simple example would be a process that needs to wait for a particular resource to have data to read or to be available for a write. Rather than the process using CPU time to repeatedly query the connection to determine if the connection is ready, the process may call select( ) and then become dormant if the connection is not immediately ready. The call to select( ) registers with the OS kernel that the calling process needs to be awakened when the interprocess communication resource becomes ready.
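For readers less familiar with the call, the following minimal C sketch shows the conventional single-memory-space use of select( ) described above. It uses the standard POSIX interface. The process declares interest in one descriptor becoming readable, and the kernel suspends it until that happens or a timeout expires. The helper name wait_readable and its millisecond timeout parameter are illustrative choices and are not part of the specification.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms milliseconds pass.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);                 /* interest: "ready to read" */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The kernel suspends the caller here instead of letting it poll. */
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

With a timeout of zero the call degenerates into a single non-blocking poll. Any positive timeout lets the process sleep instead of spinning.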

[0006] The select( ) OS call may be used by a process having open connections to one or more sockets. A socket is a resource for interprocess communication that is generally used between a user process and a special I/O process handling a network I/O protocol such as the common internet protocols TCP or IP. Sockets are generally implemented as a data structure within the OS memory space, and this memory space is accessible to the OS, the I/O process responsible for delivering data to the socket, and the user process that is communicating via the socket. A socket data structure has associated with it all state information necessary to handle the interprocess communication and generally includes a pointer to a memory location for temporarily storing the actual data packets flowing between the user process and the I/O connection.

[0007] Select( ) also may be used on other OS data structures used for interprocess communication, such as pipes or FIFOs, and for other opened I/O such as ttys, disk opens, and directory opens. While select( ) performs a similar function no matter what type of data structure it is called on, the details of the select( ) implementation for different data structures may vary in different operating systems, as described more fully below.

[0008] A good description of the select( ) system call and sockets can be found in the reference book UNIX Network Programming, by Richard Stevens (section 6.13), Prentice Hall, 1990.

Background of Distributed Memory Environments

[0009] In operating systems that have a single central processing unit (CPU) or multiple CPUs with shared memory, select( ) can be easily implemented because the data structures representing sockets, pipes, or FIFOs in the system are contained in a single memory space, as shown in FIG. 1, and therefore are directly accessible to every process running in that memory space, including the OS.

[0010] However, in non-shared memory distributed systems, such as that discussed in U.S. Pat. No. 4,228,496, select( ) is more difficult to implement. The information regarding the occurrence of different events on different data structures may be contained in a memory different from the one that holds the user process calling select( ). That information may not be directly accessible even to the OS local to the process that called select( ).

[0011] What is needed is a method for performing a select( ) function effectively in a distributed memory environment.

SUMMARY OF THE INVENTION

[0012] In one embodiment, a distributed operating system incorporating a distributed select function includes: a first agent running in a first memory space; a second agent running in a second memory space; a first data structure for interprocess communication residing in the first memory space; and a second data structure for interprocess communication residing in the second memory space. In this system, the first and second agents send and receive messages between the first and second memory spaces and provide a standard select function interface to user processes.

[0013] In another embodiment, a method in a computing environment having multiple CPUs and multiple memory spaces for facilitating communication between a process in a first memory space and a process in a second memory space includes: creating a first agent process in the first memory space; directing to the agent a request by a first process for creating an interprocess connection to a process in a second memory space; creating a data structure in the first memory space for facilitating interprocess communication; transmitting a message representing the request from the agent to the second memory space; receiving the request at a second process in the second memory space; creating a data structure in the second memory space for facilitating interprocess communication; using the first and the second data structures to communicate data with processes in the respective memory spaces; and using the agent and the second process to communicate data between the data structures. Notably, the first process is unaware that it is running in a distributed memory space environment. Also, the process in the second memory space can be a second agent, a continuously running I/O process, etc.

[0014] In yet another embodiment, a computer readable medium in a distributed computing environment embodies program code with instructions for facilitating communication between a process existing in a first memory space and a process existing in a second memory space, including instructions for: creating a first agent process in the first memory space; directing to the agent a request by a first process for creation of an interprocess connection to a process in a second memory space; creating a data structure in the first memory space for facilitating interprocess communication; transmitting a message representing the request from the agent to the second memory space; receiving the request at a second process in the second memory space; creating a data structure in the second memory space for facilitating interprocess communication; using the first and the second data structures to communicate data with processes in the respective memory spaces; and using the agent and the second process to communicate data between the data structures.

[0015] In yet another embodiment, a method is provided for facilitating a system function call and interprocess communication in a distributed computer system environment. In this environment, two processes are instantiated, each in its own memory space associated with its operating system, and the first of the two processes calls a system function to establish interprocess communication with the second. This method includes:

- instantiating an agent in each of the memory spaces, a first agent in the memory space of the first process and a second agent in the memory space of the second process;
- receiving the system function call from the first process at its operating system;
- prompting the first agent, in response to the system function call, to create a shadow socket in the memory space of the first process;
- prompting the second agent, in response to a message from the first agent about the system function call, to create a real socket in the memory space of the second process, the real socket receiving data associated with the system function call;
- transferring the data from the real socket to the shadow socket; and
- prompting the first agent to wake up the first process when the I/O data is available in the shadow socket.

The first process is not aware that the second process is in a remote memory space. In this environment, one system function call is select( ) and, in one instance, the first and second processes are user and input/output (I/O) processes, respectively. Hence, in this instance the real socket receives I/O data as well as connection information, including the I/O data destination (the first memory space).

[0016] Advantages of the invention will be understood by those skilled in the art, in part, from the description herein. Also, advantages of the invention will be realized from practice of the invention disclosed herein.

Glossary

[0017] The following terms are used in this application in accordance with the explanations below and also in accordance with their broad meanings as understood in the art.

[0018] Connection—a means for communicating data between processes or between a process and an I/O device, such as a socket, a RAM file, a disk file, a directory, a pipe, a FIFO, a TTY, etc.

[0019] Event—an occurrence on a connection, such as a ready-to-read, ready-to-write, or exceptional condition.

[0020] File descriptor (FD)—An identifier local to a user process and shared by only that process and related (child) processes for identifying a particular connection opened or created by that process.

[0021] I/O process—An input/output process for managing data input and output on some physical resource such as a network. Also called a server process.

[0022] Kernel—The executing operating system code that manages the activity of all other processes.

[0023] Message—In a distributed CPU environment, data communicated between two or more different CPUs or memory spaces over a communications channel.

[0024] Operating System (OS)—The kernel and other callable functions and resources available to processes.

[0025] Process—a collection (generally a sequence) of executable program code and associated state information that can independently execute and may at any given time be either executing or dormant.

[0026] Socket—A data construct residing in a memory that allows one process to communicate data with another process via a standard interface.

[0027] User process—A process created to perform a user function. Also called an application.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] FIG. 1 is a block diagram illustrating a computing environment of the prior art in which a select( ) function may be utilized.

[0029] FIG. 2 is a block diagram illustrating a distributed computing environment embodying the invention.

[0030] FIG. 3 is a block diagram illustrating an agent in accordance with the invention.

[0031] FIG. 4 is a flow chart of a method of creating an interprocess connection according to the invention.

[0032] FIG. 5 is a flow chart of a method of accomplishing a distributed select according to the invention.

[0033] FIG. 6 is a block diagram illustrating a generalized computing system upon which the present invention may be implemented.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Select( ) and Sockets in a Single-Memory-Space Computer

[0034] FIG. 1 is a block diagram illustrating a computing environment of the prior art. This figure depicts a computer system 1, with multiple CPUs 5 a-b, a shared memory space 10, I/O ports 15 a-b with I/O processes 22 a-c, user processes 24 a-c, sockets 26 a-b, and FIFO 27.

[0035] Processes shown in FIG. 1 include the operating system process 20, also referred to as the kernel, I/O processes 22 a-c, and user processes 24 a-c.

[0036] Kernel 20 is a continually running process that manages the overall operation of the operating system and resource usage by other processes. I/O processes 22 a-c are each generally connected to a physical I/O interconnection such as a network connection, disk drive, or video display, and manage data communications between the physical device and other processes in the system. User processes 24 a-c are user-invoked executable code that performs user functions.

[0037] Sockets 26 a-b and FIFO 27 are data structures residing in the shared memory space that are created by a process or the kernel to facilitate interprocess communication.

[0038] As shown in FIG. 1, SOCKET_B is connected between I/O process 22 b and user process 24 a. With SOCKET_B connected in this way, I/O_PROCESS_B can receive data intended for USER_PROCESS_A and store it at the memory location pointed to by SOCKET_B, even when USER_PROCESS_A is not available. Whenever USER_PROCESS_A becomes available or is awakened by the kernel, it can read the data from SOCKET_B without interrupting whatever new action I/O_PROCESS_B is undertaking. SOCKET_B is generally constructed to receive and store a large number of data packets until a user process that wants data from the socket performs a read on that socket. For each read performed, SOCKET_B delivers and then discards the next packet of data, in the order the packets were received from I/O_PROCESS_B.
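The read semantics of SOCKET_B can be modeled as a first-in, first-out packet queue. The sketch below is a hypothetical illustration only. The following are assumptions, not details from the specification:

- the names sock_buf, sock_enqueue and sock_read;
- the fixed capacity;
- the datagram-style choice to drop any bytes of a packet that do not fit the reader's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SOCK_MAX_PKTS 8    /* hypothetical queue capacity */
#define SOCK_PKT_SIZE 64   /* hypothetical maximum packet length */

struct pkt {
    size_t len;
    unsigned char data[SOCK_PKT_SIZE];
};

/* Hypothetical socket buffer: a ring of packets kept in arrival order. */
struct sock_buf {
    struct pkt q[SOCK_MAX_PKTS];
    size_t head;    /* index of the oldest packet */
    size_t count;   /* number of packets currently queued */
};

/* I/O process side: store an arriving packet. Returns -1 if full or too big. */
int sock_enqueue(struct sock_buf *s, const void *data, size_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    if (s->count == SOCK_MAX_PKTS || len > SOCK_PKT_SIZE)
        return -1;
    struct pkt *p = &s->q[(s->head + s->count) % SOCK_MAX_PKTS];
    if (len > 0)
        memcpy(p->data, data, len);
    p->len = len;
    s->count++;
    return 0;
}

/* User process side: deliver, then discard, the oldest packet.
 * Returns the number of bytes copied, or -1 if no packet is queued. */
long sock_read(struct sock_buf *s, void *out, size_t cap)
{
    assert(s != NULL);
    if (s->count == 0)
        return -1;
    struct pkt *p = &s->q[s->head];
    size_t n = p->len < cap ? p->len : cap;  /* excess bytes are dropped */
    if (n > 0)
        memcpy(out, p->data, n);
    s->head = (s->head + 1) % SOCK_MAX_PKTS;
    s->count--;
    return (long)n;
}
```

Because the writer only advances the tail and the reader only advances the head, I/O_PROCESS_B can keep queuing packets while USER_PROCESS_A is asleep.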

Select( )

[0039] A process such as USER_PROCESS_A may call select( ) on SOCKET_B when it wants to access SOCKET_B but is not sure that SOCKET_B is ready for the type of operation desired. The select( ) call alerts OS kernel 20 that USER_PROCESS_A wishes to be awakened when SOCKET_B is ready to perform the requested operation.

[0040] Stated more generally, a process may call a select( )-type function when it is waiting for one or more events to occur on one or more connections. A process may go dormant after calling select( ) if the event does not happen immediately. In one specific implementation of select( ), as called on sockets, there are three types of events: (1) a socket is ready to have data read from it, (2) a socket is ready to have data written to it, and (3) an exceptional condition relating to a socket is pending. Generally, exceptional conditions are defined only for sockets, while read/write readiness is defined for most connections.

[0041] In a specific example, the process calling select( ) provides the following parameters to OS kernel 20:

[0042] Bitmask for read;

[0043] Bitmask for write;

[0044] Bitmask for exception (for sockets);

[0045] Size of bitmask (number of bits);

[0046] Timeout; and

[0047] A variable for the select( ) return value.

[0048] Bits set in a bitmask identify the connections in which the calling process is interested for that operation. If no connections of interest are ready, then the process is suspended until one of the connections becomes ready or the timeout expires. The select( ) system call returns a count of how many requested connections are ready (this value can be 0 if the timeout expires) and updates the bitmasks. If a bit was set and the corresponding connection is not ready, that bit is cleared to 0. If a bit was set and the corresponding connection is ready, the bit is left set to 1.

File Descriptors (FD's)

[0049] In general, in UNIX and similar systems, information regarding connections requested by a user process is communicated between the user process and the OS using File Descriptors (FD's). FD's are generally integer identifiers. They are assigned when a user process executes a particular system call to establish a connection, such as open( ) (on existing disk files, directories, etc.), socket( ), pipe( ), etc. Once returned, FD's are used in subsequent system calls like read( ), write( ), close( ), or select( ) to identify the connection on which to perform the operation, and are used in bitmasks as described above. FD's in general are local to the process that requested them, but may also be inherited by child processes and in some cases may be passed to other processes. OS 20 maintains a data structure containing information about each process, the FD's defined for that process, and the connections indicated by those FD's.
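The per-process bookkeeping described above can be pictured as a small table from FD numbers to connection objects. In this hypothetical sketch, fd_table, fd_alloc, fd_dup and fd_lookup are illustrative names, and the fixed limit of 16 FDs is an assumption. As in UNIX, a new FD takes the lowest free slot, and a dup( )-style call yields a second FD referring to the same connection.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 16   /* hypothetical per-process FD limit */

/* Hypothetical connection object: a socket, pipe, FIFO, open file, ... */
struct connection {
    int kind;
    int refcount;    /* number of FDs referring to this connection */
};

/* Hypothetical per-process table kept by the OS: FD number -> connection. */
struct fd_table {
    struct connection *slot[MAX_FDS];
};

/* Give conn the lowest free FD, as open(), socket() or pipe() would. */
int fd_alloc(struct fd_table *t, struct connection *conn)
{
    assert(t != NULL && conn != NULL);
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (t->slot[fd] == NULL) {
            t->slot[fd] = conn;
            conn->refcount++;
            return fd;
        }
    }
    return -1;   /* no free FD */
}

/* dup()-style call: a second FD that refers to the same connection. */
int fd_dup(struct fd_table *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || t->slot[fd] == NULL)
        return -1;
    return fd_alloc(t, t->slot[fd]);
}

/* Map an FD passed to read(), write() or select() back to its connection. */
struct connection *fd_lookup(const struct fd_table *t, int fd)
{
    return (fd >= 0 && fd < MAX_FDS) ? t->slot[fd] : NULL;
}
```

Two FDs mapping to one connection object is exactly the case in which several select( ) calls can express interest in the same underlying connection.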

Use of Shared Memory Space for Interprocess Communication

[0050] In the shared memory system shown in FIG. 1, interprocess communication takes place through the mechanism of the shared memory and is managed by OS 20. If OS 20 or any other authorized process wishes to know the status of any one of the sockets 26 a-b or of any other connection resource, it simply reads the shared memory space where those sockets reside and thereby acquires the relevant data.

Overview of Distributed Memory Space Implementations

[0051] FIG. 2 is a block diagram illustrating a computing environment 2 in which a distributed select( ) and other interprocess communication functions may be advantageously utilized according to the present invention. As in FIG. 1, this computer environment may include multiple CPUs 5 a-c, I/O ports such as PORT_1, user processes 24 a-d, sockets such as 26 a, and pipes or FIFOs such as 27 a.

[0052] Computing environment 2 differs from environment 1 in that environment 2 is distributed onto a number of nonshared memory spaces 10 a-c, with each memory space generally having its own OS kernel 20, 22, or 23. Data structures, including resources for interprocess communication, residing in one memory space are not directly accessible to processes in a different memory space. For purposes of this discussion, data structures, processes, and CPUs will be referred to as local to one another if they are directly associated with the same memory space. They will be referred to as remote to processes or CPUs connected with other memory spaces.

[0053] According to the present invention, a user process such as USER_PROCESS_A is not necessarily aware that it is running in a distributed memory environment. However, the invention allows USER_PROCESS_A to establish an interprocess connection with any other process in the environment as though, from USER_PROCESS_A's perspective, all processes were running in the same memory space.

[0054] According to the invention, a user process, such as USER_PROCESS_A, may in addition call a select( ) type function on opened interprocess connections even when those connections are not local to the calling process.

[0055] Computer environment 2 includes a number of mechanisms that allow this, including communication channel 50, which allows CPUs 5 a-c to communicate data to each other. One mechanism for this CPU-to-CPU communication is a messaging system, such as the Guardian™ messaging system described in U.S. Pat. No. 4,228,496.

[0056] According to the present invention, computer environment 2 also includes a number of agent processes in order to facilitate interprocess communications and implement a distributed select( ) function. These agent processes, such as 30 a-c, 32 a-c, 33 a-c, are created by each OS in order to manage connections over the distributed environment as described below.

Establishing a Remote Interprocess Connection

[0057] The present invention may be further understood by considering the method, as illustrated by FIGS. 2 and 4, by which a connection between two processes is established in a distributed memory environment in accordance with the present invention. For the purposes of this example, assume that USER_PROCESS_A in memory space 10 a wishes to establish a connection with I/O_PROCESS_A residing in a remote memory space 10 c. USER_PROCESS_A is not necessarily aware that it is running in a distributed memory environment.

[0058] According to the present invention, the interprocess connection would be created as follows: USER_PROCESS_A calls a system function to create the connection to I/O_PROCESS_A (Step S2). In one specific embodiment, this system function might be a socket( ) call.

[0059] The system function call is received by the operating system of memory space 10 a, which is specifically designed to operate over a distributed environment (Step S4). The operating system alerts SOCKET_AGENT_1 that USER_PROCESS_A wishes to create a socket with I/O_PROCESS_A (Step S6). SOCKET_AGENT_1 creates a SHADOW_SOCKET_A local to USER_PROCESS_A (Step S8). Once created, this SHADOW_SOCKET_A, from USER_PROCESS_A's perspective, behaves just as a socket would in a nondistributed memory environment.

[0060] SOCKET_AGENT_1 then sends a message to the remote memory space 10 c via CPU_1 and the CPU bus 50 (Step S10). When I/O_PROCESS_A receives this message at the remote memory space, it creates the real SOCKET_A in its memory space for holding connection information and storing data received on PORT_1 (Step S12). Whenever I/O_PROCESS_A receives I/O data, it checks SOCKET_A to determine what to do with that data (Step S14).

[0061] If SOCKET_A indicates that USER_PROCESS_A is the destination for that data, I/O_PROCESS_A or kernel 23 initiates a message, again through the CPU, to memory space 10 a. The message alerts the processes of memory space 10 a to the presence of the data in SOCKET_A (Step S16). According to one embodiment of the invention, I/O_PROCESS_A, like SOCKET_AGENT_1, is a process that is always running and is specifically aware that it is in a distributed environment. According to one embodiment of the invention, two modes are defined:

- In one mode, the data is immediately transmitted from SOCKET_A to SHADOW_SOCKET_A and is stored there until it is read by USER_PROCESS_A.
- In an alternative mode, the data is held at SOCKET_A until a read request is received from USER_PROCESS_A.
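The two delivery modes of paragraph [0061] can be contrasted in a few lines. The following C sketch is hypothetical. The mode names, message kinds and byte counting are assumptions made for illustration. Only the decision logic on the I/O process side is modeled, not the actual message transport over bus 50.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two delivery modes of paragraph [0061]. */
enum delivery_mode { PUSH_ON_ARRIVAL, HOLD_UNTIL_READ };
enum msg_kind { MSG_DATA, MSG_DATA_READY_NOTICE };

/* Hypothetical state of the real socket (SOCKET_A) in the I/O process. */
struct real_socket {
    enum delivery_mode mode;
    int dest_cpu;        /* CPU whose memory space holds the shadow socket */
    size_t held_bytes;   /* data kept here in HOLD_UNTIL_READ mode */
};

/* Data for this socket arrived on the port (Steps S14-S16).
 * Returns the kind of message to send to dest_cpu. */
enum msg_kind on_port_data(struct real_socket *s, size_t nbytes)
{
    assert(s != NULL && nbytes > 0);
    if (s->mode == PUSH_ON_ARRIVAL)
        return MSG_DATA;              /* ship the bytes to the shadow now */
    s->held_bytes += nbytes;
    return MSG_DATA_READY_NOTICE;     /* only announce that data is waiting */
}

/* HOLD_UNTIL_READ mode: a read request arrived from the user's CPU.
 * Returns how many held bytes to ship, at most want. */
size_t on_read_request(struct real_socket *s, size_t want)
{
    assert(s != NULL);
    size_t n = s->held_bytes < want ? s->held_bytes : want;
    s->held_bytes -= n;
    return n;
}
```

The trade-offs differ between the two modes:

- The push mode spends bus bandwidth as soon as data arrives, but it lets USER_PROCESS_A read locally.
- The hold mode sends only a short notice and defers the transfer, at the cost of a round trip per read.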

[0062] When a message is received at memory space 10 a, it is passed to SOCKET_AGENT_1, which is always running (Step S18). SOCKET_AGENT_1 then examines SHADOW_SOCKET_A to determine whether action needs to be taken in response to the message from I/O_PROCESS_A (Step S20). If SOCKET_AGENT_1 determines that USER_PROCESS_A is waiting to receive data, SOCKET_AGENT_1 can initiate a wake-up to USER_PROCESS_A, which can then take the appropriate action (Step S22).
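Steps S18 through S22 amount to a small message handler in the socket agent. The sketch below is hypothetical. The names shadow_socket, data_msg and agent_on_message are illustrative, and the linear table search stands in for whatever lookup a real agent would use. The function wake_process merely records the pid in place of a kernel wake-up primitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of a shadow socket such as SHADOW_SOCKET_A. */
struct shadow_socket {
    int id;
    size_t buffered;       /* bytes received from the real socket */
    bool reader_waiting;   /* a local process is blocked on this socket */
    int waiter_pid;
};

/* Hypothetical message from the remote I/O process (Step S16). */
struct data_msg {
    int shadow_id;
    size_t nbytes;
};

/* Stand-in for the kernel wake-up primitive: records the last pid woken. */
static int last_woken = -1;
static void wake_process(int pid) { last_woken = pid; }

/* Socket agent, Steps S18-S22: post the data to the matching shadow socket
 * and wake the waiting process if there is one. Returns true on a wake-up. */
bool agent_on_message(struct shadow_socket *table, size_t n,
                      const struct data_msg *m)
{
    assert(m != NULL && (table != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        struct shadow_socket *s = &table[i];
        if (s->id != m->shadow_id)
            continue;
        s->buffered += m->nbytes;          /* Step S20: update shadow socket */
        if (s->reader_waiting && s->buffered > 0) {
            s->reader_waiting = false;
            wake_process(s->waiter_pid);   /* Step S22 */
            return true;
        }
        return false;
    }
    return false;   /* no such shadow socket (e.g. already closed): drop */
}
```

Because the agent, not the user process, receives the message, a notification for a socket that has since been closed is simply dropped.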

Distributed Select( )

[0063] The present invention further enhances a distributed memory space environment by providing a means for implementing a select( ) system function in that environment. According to the present invention, agents such as 30 a-c perform a number of key functions in implementing a distributed select( ), as illustrated in FIG. 5.

[0064] According to one embodiment of the invention, an agent is passed, by its local OS, all select( ) calls from any local processes that refer to remote connections (Step T2). The agent prepares messages to the remote connections, which are transmitted over message bus 50 (Step T6). The agent process receives all notify messages from the remote connections in response to events specified by the distributed select( ) call (Step T8).

[0065] The remote process, or the agent managing the remote connection, keeps track of three things:

- which interprocess resources (or FDs) are of interest to a given user process;
- which operations are of interest for a given resource; and
- which CPUs are interested in a given resource/operation.

[0066] An I/O process or a remote agent may receive, via a select( ) or other call, a request to be notified when a particular FD is ready. If the FD is not ready, the I/O process or remote agent stores information indicating that there is interest in the FD, which CPU is interested in the FD, and what event is of interest (Step T10). If the FD becomes ready for the operation of interest, the I/O process sends readiness information to the requesting CPU (Step T12).

[0067] Logically speaking, the calling process (or the OS kernel on behalf of the calling process) sends the message to the remote process expressing interest in a given FD. According to the invention, however, the response from the remote process is not necessarily returned to the calling process, because the calling process may have terminated, or may time out on select( ) and then exit normally. The present invention solves this problem by having all responses to select( ) returned to an agent process. The agent processes are always available to receive responses and to take the appropriate action.

[0068] The agent process acts as a middleman between local processes calling select( ) and remote processes:

- An agent acquires the select( ) messages built by an OS kernel and sends them to the remote process.
- An agent receives select( ) readiness messages from the remote process.
- An agent posts the information from the messages to data structures such as sockets, adds FDs to the calling process' linked list of ready FDs, and wakes the calling process when a selected FD becomes ready.

[0069] FIG. 3 shows a block diagram of an agent process as an example of an embodiment of the present invention. According to the present invention, an agent process includes a local interface for interacting with the local operating system and local processes. According to one embodiment of the invention, this local interface 130 receives all select( ) calls from local processes, even those that do not require any remote access. According to the present invention, agent process 30 a sends a message via a remote interface 134 to the remote process running in a remote memory space when its select database shows both of the following:

- a particular connection operation requires notification of a remote process; and
- that remote process has not already been notified that the CPU on which the calling process is running desires notification of that particular connection event.

[0070] Once agent process 30 a receives any responses to requests it has sent to remote processes, it passes those responses to a distributor 136. The distributor determines which local processes need to be notified of the response and what action, such as waking up a local process, needs to be taken. Distributor 136 then uses the local interface to communicate such responses and take the appropriate action.

[0071] Another aspect of the current invention concerns consistency between socket function calls. To maintain it, the present invention may employ a separate shadow socket and real socket, and use a socket agent to communicate between a user process and an I/O process, even when the user process and the I/O process reside within the same memory space.

[0072] In one specific embodiment of an environment incorporating the invention, there are as many as three agent processes in each distributed memory space: one handling sockets, one handling pipes, and one handling FIFOs. Alternative embodiments could employ one agent process, or more than one, to handle different interprocess communication resources. For example, one agent process could handle pipes and FIFOs and a different agent process could handle sockets.

OS Kernel Code Functions in Distributed Select( )

[0073] In one specific embodiment of the present invention, OS kernel code performs several functions to implement the distributed select( ):

- It accepts and validates input parameters from processes calling select( ), and gathers and formats return parameter information.
- It marks kernel data structures to provide information needed by the local agent process.
- It builds the select( ) messages that are sent to remote processes.

[0074] The OS kernel in one embodiment invokes the system call “WAIT” to suspend the select( ) calling process. The calling process is awakened by the agent if an FD becomes ready, or by the OS kernel if select( ) times out or is interrupted by a signal. If the calling process is awakened because an FD became ready, it still needs to learn which FD of interest is ready and for which operation (read/write/exception). One optimization of the present invention is to provide the agent with the location of a linked list of data structures containing information regarding ready FDs. When a calling process is awakened, the OS kernel (on behalf of the calling process) peruses this list to learn which FDs are ready. The kernel checks whether these FDs are ones the calling process is interested in. If so, the kernel updates the bitmasks provided by the calling process and returns control to the calling process. If not, the kernel suspends the calling process again and waits for FDs of interest to become ready.

Reducing Messages Between CPUs

[0075] A benefit of the present invention concerns how multiple instances of the same connection or FD existing on the local CPU are handled. As is known in the art, a calling process may fork( ) and create child processes that inherit the parent's open FDs. Alternatively, the calling process may call dup( ) and obtain multiple FDs that refer to the same connection. In either case, select( ) may be called such that interest is indicated for the same FD multiple times. Both parent and child may call select( ) on an FD independently, or a single process may call select( ) using multiple FDs that refer to the same opened connection. To reduce message bandwidth on bus 50 and improve system performance, it is desirable to send only a single select( ) message to the remote process indicating that a particular FD and event on that FD are of interest.

[0076] According to the invention, when a second select( ) request comes into the local OS kernel, that select is passed to the local agent process. The agent checks a database it maintains, determines that the remote process has already been contacted regarding that FD, and therefore does not send another message to the remote process (Step T6). However, the agent process records that this second select( ) call was made. As a result, when the remote process sends a response indicating that the FD has become ready, the response is distributed to all calling processes that called select( ) and/or to multiple duplicate FDs in the same process. Not sending unnecessary duplicate select( ) requests to the remote process saves duplicate messages on bus 50 on both the transmit and receive ends of the select.

Fault Tolerance

[0077] The present invention provides a mechanism for increased fault tolerance of select( ) functions and interprocess communication in a distributed memory environment such as that shown in FIG. 2. According to this aspect of the invention, an agent process will, from time to time, check whether the remote processes for which it maintains connection information are still active in a remote CPU. If a remote process terminates unexpectedly, the local agent becomes aware of this. It then informs local processes that may be paused or asleep waiting for action by the remote process, so that they awaken and take appropriate action. According to this aspect of the invention, the distributed memory system shown in FIG. 2 is fault-tolerant in that one entire memory space and its associated CPU can fail, and the overall environment will continue to function and recover from the failure of that one piece.
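The specification does not say how an agent detects that a remote process has terminated. One plausible approach, assumed here purely for illustration, is a timeout on the last reply or heartbeat received from each remote peer. The names remote_peer, agent_check_peers and EPEERDEAD are hypothetical. The function wake_waiter only records the pid in place of a real wake-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define EPEERDEAD 1          /* hypothetical error code seen by a waiter */

struct remote_peer {
    bool active;
    int cpu;                 /* CPU hosting the remote process */
    time_t last_heard;       /* time of the last reply or heartbeat */
    int waiting_pid;         /* local process blocked on this peer, or -1 */
    int wait_error;          /* set to EPEERDEAD when the peer is lost */
};

/* Stand-in for the kernel wake-up primitive: records the last pid woken. */
static int woken_pid = -1;
static void wake_waiter(int pid) { woken_pid = pid; }

/* Periodic agent check (assumed heartbeat scheme): a peer silent for more
 * than timeout_s seconds is treated as failed, and its local waiter is given
 * an error and woken instead of sleeping forever. Returns peers lost now. */
int agent_check_peers(struct remote_peer *peers, int n, time_t now,
                      double timeout_s)
{
    assert(peers != NULL || n == 0);
    int lost = 0;
    for (int i = 0; i < n; i++) {
        struct remote_peer *p = &peers[i];
        if (!p->active || difftime(now, p->last_heard) <= timeout_s)
            continue;
        p->active = false;
        lost++;
        if (p->waiting_pid >= 0) {
            p->wait_error = EPEERDEAD;
            wake_waiter(p->waiting_pid);
            p->waiting_pid = -1;
        }
    }
    return lost;
}
```

The key property matches the text: the failure of a remote memory space turns into an error wake-up for local processes, not an indefinite sleep.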

Scalability

[0078] The present invention also provides a means for increased scalability in a distributed memory system such as that shown in FIG. 2. According to this aspect of the invention, any number of additional memory spaces with CPUs may be added to an integrated system, each separate space having its own set of agent processes to handle socket, FIFO, and pipe connections. The invention works in essentially the same way regardless of the number of additional memory spaces that are added to the system.

Variations of Implementation for FIFOs and Pipes

[0079] According to the present invention, interprocess communications may be handled identically for different types of interprocess connections such as sockets, pipes, or FIFOs. However, the invention also allows for variations in the handling of different types of interprocess connections in order to optimize performance.

[0080] As shown in FIG. 2, a separate agent process may be created in each memory space for sockets, FIFOs, and pipes. According to an embodiment of the invention, a pipe agent such as 32 c may operate similarly to the socket agent previously described, the difference being that, for a pipe connection, the pipe agent such as 32 c communicates with a remote pipe agent such as 33 c rather than with a remote I/O process. In this case, the remote pipe agent 33 c may create a shadow pipe 27 b to communicate data locally with USER_PROCESS_C. Communications between pipe agents such as 32 c and 33 c take place over CPU system bus 50, as they do for sockets. Such agent-to-agent communication is not necessary for sockets because, in the case of sockets, there is always a continuously running remote I/O process that can create the remote real socket and can send and receive messages on the remote end. FIFOs and pipes are not automatically associated with such a continuously running process, so agent-to-agent communication is used.
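The difference between the socket and pipe cases is mainly which continuously running process sits at the remote end, plus the shadow pipe the remote pipe agent creates. A minimal sketch, assuming hypothetical names (`remote_peer`, `PipeAgent`, the string peer labels); only `os.pipe`, `os.write`, and `os.read` are real library calls.

```python
import os
from enum import Enum


class ConnKind(Enum):
    SOCKET = "socket"
    PIPE = "pipe"
    FIFO = "fifo"


def remote_peer(kind):
    """Which continuously running process the local agent talks to remotely."""
    if kind is ConnKind.SOCKET:
        return "io_process"           # e.g. I/O process 22b owns the real socket
    return f"{kind.value}_agent"      # e.g. pipe agent 33c for pipes


class PipeAgent:
    """Hypothetical remote pipe agent (the role of 33c) owning shadow pipes."""

    def __init__(self):
        self.shadow_pipes = {}  # conn_id -> (read_fd, write_fd)

    def open_shadow(self, conn_id):
        read_fd, write_fd = os.pipe()
        self.shadow_pipes[conn_id] = (read_fd, write_fd)
        return read_fd  # the end handed to the local user process

    def on_message(self, conn_id, data):
        # Data from the peer pipe agent arrives over the bus and is written
        # into the local shadow pipe; a real agent would loop on partial writes.
        _, write_fd = self.shadow_pipes[conn_id]
        os.write(write_fd, data)
```

Because the shadow pipe is an ordinary local pipe, USER_PROCESS_C can read it with plain `read( )` calls and never sees the bus traffic.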

[0081] However, the essential operation of the invention for sockets, pipes, and FIFOs is similar. In each case, an operating system according to the invention facilitates interprocess communications and the select function by providing a continuously running process at both ends of a remote interprocess connection. In the case of sockets, this continuously running process is the socket agent on one end and the I/O process itself on the other. The I/O process, like the agents, is specifically designed to operate in the distributed environment and is able to send and receive messages directed to the remote memory spaces.

[0082] It should be further understood that, according to the present invention, a distributed select function may also be used with other types of files, including directories and disk files. These two file types are always ready for reading and writing and never ready for exceptions, so the implementation of select( ) for these files, even in a distributed environment, is trivial: once select( ) determines the file type, the status is known.
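The trivial case can be expressed as a file-type test. The sketch below uses the real `stat.S_ISDIR`/`stat.S_ISREG` helpers; the function names and the convention of returning `None` for "needs the agent-based path" are assumptions for illustration.

```python
import os
import stat


def trivial_select_status(st_mode):
    """(readable, writable, exceptional) for directories and regular disk files.

    Returns None for any other file type, which needs the agent-based select().
    """
    if stat.S_ISDIR(st_mode) or stat.S_ISREG(st_mode):
        return (True, True, False)
    return None


def select_status_for_path(path):
    return trivial_select_status(os.stat(path).st_mode)
```

No message to any agent or remote CPU is needed for these two types; sockets, FIFOs, and character devices fall through to `None`.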

Variations of Implementation for TTYs

[0083] According to another embodiment, select( ) on a TTY file, also called a character file, is implemented in the same way as select( ) on sockets. According to the invention, TTY select( ), like socket select( ), relies on a central I/O process to keep the controlling data structures. In the case of a TTY, however, I/O process A 22 b would be a Telnet server, which is responsible for maintaining the data structures that hold the state of the open TTY connections. The Telnet server is also responsible for communicating with a TTY agent process in each CPU to report status back to select( ), and the agent process in each CPU wakes up processes waiting for TTY select( ) indications.

Contention Mode and Data Forwarding Mode

[0084] The present invention, in one embodiment, may provide two different modes by which data received by an I/O process such as 22 b is transmitted to a remote memory space.

[0085] The normal case is data forwarding mode, in which, as soon as data is received by process 22 b, process 22 b examines the port address of the data and looks up the ultimate address of the data in SOCKET_A. From 26 a, process 22 b determines that the data is destined for process 24 a in memory 10 a. Process 22 b then composes a message to SOCKET_AGENT_1, including the packet data, and sends the message over bus 50. When the message is received by SOCKET_AGENT_1, SOCKET_AGENT_1 places the packet data in the memory location specified by SHADOW_SOCKET_A and then wakes up user process 24 a as appropriate.
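Data forwarding mode can be sketched as a lookup in the real socket's connection information followed by one message that carries the packet itself. The dataclass fields, the `bus` dictionary standing in for bus 50, and the agent's queue-per-shadow-socket layout are hypothetical simplifications, not the patent's memory layout.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RealSocket:
    """Connection info kept with the real socket (the role of SOCKET_A)."""
    dest_cpu: int
    dest_agent: str
    shadow_id: int


@dataclass
class ForwardMsg:
    shadow_id: int
    payload: bytes


class SocketAgent:
    """Hypothetical receiving agent (the role of SOCKET_AGENT_1)."""

    def __init__(self):
        self.shadow_buffers = {}  # shadow_id -> queued packets
        self.wakeups = []         # shadow sockets whose readers were woken

    def deliver(self, msg):
        # Place the packet in the shadow socket, then wake its reader.
        self.shadow_buffers.setdefault(msg.shadow_id, deque()).append(msg.payload)
        self.wakeups.append(msg.shadow_id)


def forward(sock, payload, bus):
    """Data forwarding mode: push the packet straight to the owning agent.

    bus maps (cpu, agent_name) -> agent and stands in for bus 50.
    """
    agent = bus[(sock.dest_cpu, sock.dest_agent)]
    agent.deliver(ForwardMsg(sock.shadow_id, payload))
```

One bus message per packet suffices here because only one memory space can consume the data.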

[0086] The invention also may provide a contention mode forwarding strategy. I/O process 22 b operates in contention mode when it determines that user processes in more than one memory space are interested in reading data from a particular socket. In that case, packet data cannot be forwarded to a shadow socket, because I/O process 22 b does not know which process will next do a read on the data. In contention mode, the packet data is instead stored at SOCKET_A, and a message indicating that data is ready is sent over bus 50. When a read( ) is invoked by a local process such as 26 b, that read is forwarded by the OS to the local socket agent, which then sends a message over bus 50 to process 22 b to deliver a packet of data. In this way, the sockets protocol of always delivering the next available packet to the next read request is preserved.
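The choice between the two modes, and the pull-based delivery in contention mode, might look like the sketch below. The class, the message tuples, and the `notify` callback standing in for bus 50 are assumptions; the "no reader yet" case is treated like contention (hold the data) purely as a design choice for the sketch.

```python
from collections import deque


class ContendedSocket:
    """Hypothetical real socket held by an I/O process such as 22b."""

    def __init__(self, notify):
        self._notify = notify          # notify(cpu, msg): stand-in for bus 50
        self._packets = deque()
        self.interested_cpus = set()   # memory spaces with readers

    @property
    def contention_mode(self):
        return len(self.interested_cpus) > 1

    def on_packet(self, payload):
        if len(self.interested_cpus) == 1:
            # Data forwarding mode: exactly one reader, push the data now.
            (cpu,) = self.interested_cpus
            self._notify(cpu, ("DATA", payload))
            return "forwarded"
        # Contention mode (or no reader yet): keep the packet here and
        # only announce that data is ready.
        self._packets.append(payload)
        for cpu in sorted(self.interested_cpus):
            self._notify(cpu, ("DATA_READY", None))
        return "held"

    def on_read_request(self):
        """A remote agent forwarded a read(): hand out the next packet in order."""
        return self._packets.popleft() if self._packets else None
```

Whichever agent's read request reaches the I/O process first receives the oldest held packet, which is how the next-packet-to-next-read ordering is kept.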

Invention on Computer Readable Media

[0087] FIG. 6 illustrates an example of a computer system that may be used to execute software embodying aspects of the present invention. FIG. 6 shows a computer system 700 which includes a monitor 703, screen 705, cabinet 707, keyboard 709, and mouse 711. Mouse 711 may have one or more buttons such as mouse buttons 713. Cabinet 707 is shown housing a disk drive 715 for reading a CD-ROM or other type of disk 717. Cabinet 707 also houses the multiple computer processors and memory spaces shown in FIG. 2. According to one embodiment of the invention, the invention may be incorporated into operating system software or system utility software recorded onto a medium such as disk 717 which, when loaded into an appropriate computer system, causes the system to perform the described method.

[0088] The present invention has been described with reference to specific embodiments, but other embodiments will be obvious to persons of skill in the art. In particular, method steps are grouped functionally for the purposes of understanding the invention. It will be understood by those of skill in the art, however, that various method steps could be performed in different orders or could be placed in different functional groupings without changing the essential nature of the invention. The invention, therefore, should not be limited except as provided in the attached claims.

What is claimed is:
 1. A distributed operating system incorporating a distributed select function, comprising: a first agent running in a first memory space; a second agent running in a second memory space; a first data structure for interprocess communication residing in the first memory space; and a second data structure for interprocess communication residing in the second memory space; wherein the first and second agents send and receive messages between the first and the second memory spaces and provide a standard select function interface to user processes.
 2. A distributed computing environment, comprising: a plurality of central processing units; a plurality of memory spaces; a communication channel for communicating messages between the plurality of central processing units; a first agent running in a first memory space; a second agent running in a second memory space; a first data structure for interprocess communication residing in the first memory space; and a second data structure for interprocess communication residing in the second memory space, wherein the first and second agents send and receive messages between the first and the second memory spaces and provide a standard select function interface to user processes.
 3. A method in a computing environment having multiple CPUs and multiple memory spaces for facilitating communication between a process in a first memory space and a process in a second memory space, comprising: creating a first agent process in the first memory space; directing to the agent a request by a first process for creating an interprocess connection to a process in a second memory space; creating a data structure in the first memory space for facilitating interprocess communication; transmitting a message representing the request from the agent to the second memory space; receiving the request at a second process in the second memory space; creating a data structure in the second memory space for facilitating interprocess communication; using the first and the second data structures to communicate data with processes in the respective memory spaces; and using the agent and the second process to communicate data between the data structures.
 4. The method according to claim 3 wherein the first process is unaware that it is running in a distributed memory space environment.
 5. The method according to claim 3 wherein the process in the second memory space is a second agent.
 6. The method according to claim 3 wherein the second process in the second memory space is a continuously running I/O process.
 7. In a distributed computing environment, a computer readable medium embodying program code with instructions for facilitating communication between a process existing in a first memory space and a process existing in a second memory space, comprising: instructions for creating a first agent process in the first memory space; instructions for directing to the agent a request by a first process for creation of an interprocess connection to a process in a second memory space; instructions for creating a data structure in the first memory space for facilitating interprocess communication; instructions for transmitting a message representing the request from the agent to the second memory space; instructions for receiving the request at a second process in the second memory space; instructions for creating a data structure in the second memory space for facilitating interprocess communication; instructions for using the first and the second data structures to communicate data with processes in the respective memory spaces; and instructions for using the agent and the second process to communicate data between the data structures.
 8. A distributed computing system for facilitating communication between a process existing in a first memory space and a process existing in a second memory space, comprising: means for creating a first agent process in the first memory space; means for directing to the agent a request by a first process for creation of an interprocess connection to a process in a second memory space; means for creating a data structure in the first memory space for facilitating interprocess communication; means for transmitting a message representing the request from the agent to the second memory space; means for receiving the request at a second process in the second memory space; means for creating a data structure in the second memory space for facilitating interprocess communication; means for using the first and the second data structures to communicate data with processes in the respective memory spaces; and means for using the agent and the second process to communicate data between the data structures.
 9. A method for facilitating a system function call and interprocess communication in a distributed computer system environment in which two processes are instantiated, each in its own memory space that is associated with its operating system, and in which a first of the two processes calls a system function for establishing interprocess communication with the second of the two processes, comprising: instantiating an agent in each of the memory spaces, a first agent in the memory space of the first process and a second agent in the memory space of the second process; receiving the system function call from the first process at its operating system; prompting the first agent, in response to the system function call, to create a shadow socket in the memory space of the first process; prompting the second agent, in response to a message from the first agent about the system function call, to create a real socket in the memory space of the second process, the real socket receiving data associated with the system function call; transferring the data from the real socket to the shadow socket; and prompting the first agent to wake up the first process when the I/O data is available in the shadow socket, wherein the first socket is not aware that the second process is in a remote memory space.
 10. A method as in claim 9, wherein the first and second processes are user and input/output (I/O) processes, respectively, such that the real socket receives I/O data as well as connection information including the I/O data destination at the first memory space.
 11. A method as in claim 9, wherein the system function call is select( ).
 12. A distributed computer system for facilitating a system function call and interprocess communication in which two processes are instantiated, each in its own memory space that is associated with its operating system, and in which a first of the two processes is configured to call a system function for establishing interprocess communication with the second of the two processes, comprising: means for instantiating an agent in each of the memory spaces, a first agent in the memory space of the first process and a second agent in the memory space of the second process; means for receiving the system function call from the first process at its operating system; means for prompting the first agent, in response to the system function call, to create a shadow socket in the memory space of the first process; means for prompting the second agent, in response to a message from the first agent about the system function call, to create a real socket in the memory space of the second process, the socket receiving data associated with the system function call; means for transferring the data from the real socket to the shadow socket; and means for prompting the first agent to wake up the first process when the I/O data is available in the shadow socket, wherein the first socket is not aware that the second process is in a remote memory space.